An open flow network is a weighted directed graph with a source and a sink, depicting flux
distributions on networks in the steady state of an open flow system. Energetic food webs, economic
input-output networks, and international trade networks are open flow network models of energy
flows between species, money or value flows between industrial sectors, and goods flows between
countries, respectively. Flow distances (first-passage or total) between any two given nodes i and j
are defined as the average number of transition steps of a random walker along the network from i
to j under some conditions. They deviate from the conventional random walk distance
on a closed directed graph because they take the openness of the flow network into account. In this
paper, flow distances are expressed explicitly in terms of the underlying Markov matrix of a flow
system. With this novel theoretical concept, we can visualize open flow networks, calculate the
centrality of each node, and cluster nodes into groups. We apply flow distances to two kinds of
empirical open flow networks: energetic food webs and an economic input-output network. In the
energetic food web example, we visualize the trophic level of each species and compare flow distances
with other distance metrics on graphs. In the input-output network, we rank sectors according to
their average distances from other sectors and cluster sectors into groups. Some other potential
applications and mathematical properties are also discussed. In summary, flow distance is a useful
tool for studying open flow systems.

PACS numbers: 89.90.+n 


I. INTRODUCTION 

A large number of studies have shown that complex
networks are a powerful and useful tool for modeling complex
systems [1-4]. However, because traditional graphs are limited
in describing the complexity of various real systems,
weighted networks [5, 6], directed networks [7], bipartite
graphs [8], multiplex networks [9, 10], and temporal
networks [11] have emerged in the past decade as novel
extensions of conventional graphs. Among these, the open
flow network is a particular kind of directed weighted
network used to depict open flow systems.

Most complex systems are open: they exchange energy
and material with their environment [12]. Energy and
material flows are delivered to each unit of a system by
the flow network [13]. The distribution of these flows
throughout the system is described by directed
weighted edges. Two special nodes, the "source" and the "sink",
are always added to the system to represent the environment.
Because the flow system considered is assumed to be in a
steady state, the flow network is always balanced, which
means that the total inflow of each node equals its
total outflow, except for the source and the sink.

The energetic food web is a typical open flow network
that has been studied for many years by systems
ecologists. The seminal work of H.T. Odum [14, 15]
depicted the complicated energy flow transactions between
species as energy circuits. A number of indicators
have been proposed to quantify the properties of this kind of
open flow network [16-20], and numerous common
properties have been discovered [21-25]. Patten et al. proposed
a systematic "Ecological Flow Analysis" method to
investigate energetic flow networks.

Indeed, many basic ideas and approaches of flow analysis
on energetic food webs are inherited from the economic
input-output analysis method first proposed
by the economist Leontief [29]. To quantify complex
economic production processes and the interactions
between different economic sectors, an input-output
matrix is calculated for an economic system to represent
goods flows [32]. Following Leontief's seminal work,
Hannon introduced basic notions such as the fundamental
matrix to ecology for describing the energy flows between
species [33]. Therefore, an input-output matrix can also
be regarded as an open flow network. Money flows from
the final demand compartment, circulates among the
sectors of an economic system, and eventually flows to the
value-added compartment (goods flow in the opposite
direction). Thus, the value-added compartment can be
regarded as the sink, and final demand can be regarded as
the source. The money flow from industry i to industry j
is always measured in a uniform currency unit; therefore
the total outflow from the source equals the total
inflow to the sink, and both are identical to the gross domestic
output of the economy [31, 32]. Other examples of open
flow networks include clickstream networks [34, 35] and
trade networks [36]. In summary, the open flow network is a
very useful tool for depicting various open flow systems.

Distance on a graph is a very useful concept [37]. The
shortest path distance [38], the resistance distance [39], and
the mean first-passage distance of a random walker [40]
can all reflect intrinsic properties of the graph. However,
the conventional first-passage distance on a graph is
based on the assumption that the whole network
is closed, which means that the random walker cannot escape
from the network, so the total number of walkers on the
graph is conserved. The open flow network, by contrast,
is an open system: random walkers flow into
the system from the source and flow out to the sink,
although the total number of walkers staying in the network
can still be conserved if the flow system is in a steady
state. Therefore, the traditional method for closed
systems cannot simply be extended to open flow networks, and
the distance notions need to be extended for open flow
networks.

This paper is organized as follows. In Section II, the
flow distances from i to j are defined. The
explicit form of each flow distance is derived in
Subsection II C. Subsection II D shows how the distance
matrices are calculated on an example flow network. In
Subsection III A, we apply our method to energetic food
webs, visualize each species by its trophic level, and
compare different distances on the food webs. Applications
of flow distances to the input-output network, including
network visualization, sector clustering, and vertex
centrality, are introduced in Subsection III B. Finally,
Section IV gives a short summary of the paper and an
outlook on flow distances.

II. FLOW DISTANCES 

In this section, we present the definitions and
calculations of flow distances. Three flow distances, namely the
first-passage flow distance, the total flow distance, and the
symmetric flow distance, are defined. All of them can be
expressed in terms of the Markov matrix of the open flow
network. To obtain the final expressions, some intermediate
concepts, including the total flow and the first-passage flow,
need to be introduced.


A. Definitions 

Consider an open flow network with N ordinary nodes,
to which two special nodes are added: the "source", denoted by 0,
and the "sink", denoted by N + 1. An (N + 2) × (N + 2) matrix
F can be used to represent the flows, where each entry $f_{ij}$,
with $i, j \in \{0, 1, 2, \dots, N + 1\}$, represents the flow from
node i to node j. Note that the elements in the first column
and the last row are all equal to 0 because there is no
inflow to the source and no outflow from the sink. We also
define $f_{i\cdot} = \sum_{j=1}^{N+1} f_{ij}$ as the total outflow from i, and
$f_{\cdot j} = \sum_{i=0}^{N} f_{ij}$ as the total inflow to j. In our research,
the flow network should be balanced, which means that
$f_{\cdot i} = f_{i\cdot}$ for every node i except the "source" and the "sink".
In particular, we call $f_{i,N+1}$, the flow from i to the sink,
the dissipation.
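The definitions above can be made concrete with a small numerical sketch. The five-node network below is hypothetical (it is not the example network of this paper); the flow values, the helper name `is_balanced`, and the tolerance are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical network (not from the paper): 0 = source, 1..3 = ordinary
# nodes, 4 = sink, so N = 3 and F is (N + 2) x (N + 2).
N = 3
F = np.zeros((N + 2, N + 2))
F[0, 1] = 100.0                  # source -> 1
F[1, 2], F[1, 4] = 60.0, 40.0    # 1 -> 2 and 1 -> sink
F[2, 3], F[2, 4] = 50.0, 30.0    # 2 -> 3 and 2 -> sink
F[3, 2], F[3, 4] = 20.0, 30.0    # 3 -> 2 and 3 -> sink

def is_balanced(F, tol=1e-9):
    """True if total inflow equals total outflow at every ordinary node."""
    inflow = F.sum(axis=0)     # column sums, f_{.j}
    outflow = F.sum(axis=1)    # row sums, f_{i.}
    return bool(np.allclose(inflow[1:-1], outflow[1:-1], atol=tol))

# Dissipation f_{i,N+1}: the flow from each node to the sink.
dissipation = F[:, -1]
```

In a balanced network the total outflow of the source also equals the total inflow of the sink, which gives a quick sanity check on input data.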


Suppose a large number of particles flow along the links
of the network F. The direct flow $f_{ij}$ from i to j is
the total number of particles that jump from i directly
to j along the edge $i \to j$ in each time step. Particles may
also travel from i to j along indirect paths. We define the
first-passage flow from i to j, denoted by $\phi_{ij}$, as the
number of particles that reach j for the first time in each
time step and have visited i. The average number of
steps that these particles have taken is defined as the
first-passage flow distance, denoted by $l_{ij}$.

Similarly, the total flow from i to j, denoted by $\rho_{ij}$, is
defined as the total number of particles that have
visited i and arrive at j in each time step, regardless of whether
it is their first arrival. The average number of steps that these
particles have taken is defined as the total flow distance,
denoted by $t_{ij}$.

To understand these quantities better, consider
the following thought experiment. Suppose all the
particles passing by node i are dyed red, and this color is
washed out once a red particle arrives at node j for
the first time. Then the first-passage flow from node i to
j is the number of red particles passing by node j in each
time step, and the first-passage flow distance is the average
number of steps that these particles have taken. Similarly, if the particles
passing by i are dyed red but the color is never
washed out, then the number of red particles that pass
by j in each time step is the total flow, and the average number of
steps that these particles have taken is the total flow distance.

In this paper, all matrices are denoted by capital
letters, and their elements are denoted by the corresponding
lower-case letters. For example, F denotes the flow matrix,
and $f_{ij}$ is its element in the ith row and jth column.


B. Calculation of total flow and first-passage flow 


Because the open flow system is in a steady state, and 
the flow network is balanced, we can define a Markov 
matrix M as follows. 


$$m_{ij} = \frac{f_{ij}}{\sum_{k=1}^{N+1} f_{ik}}, \tag{1}$$

and $m_{ij}$ represents the probability that a particle jumps
from state i to state j. Note that $\sum_{j=1}^{N+1} m_{ij} = 1$ for any i
except N + 1, because the elements in the last ((N + 1)th)
row are all zeros; this is a key difference between an open
flow network and a closed flow network.
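As an illustration of Eq. (1), the following sketch row-normalizes a flow matrix while leaving the all-zero sink row untouched. The flow values and the function name `markov_matrix` are hypothetical and serve only as an example.

```python
import numpy as np

def markov_matrix(F):
    """Row-normalize the flow matrix F as in Eq. (1); zero rows stay zero."""
    F = np.asarray(F, dtype=float)
    outflow = F.sum(axis=1, keepdims=True)      # total outflow f_{i.}
    return np.divide(F, outflow, out=np.zeros_like(F), where=outflow > 0)

# Hypothetical balanced network: node 0 is the source, node 4 is the sink.
F = np.array([
    [0, 100,  0,  0,  0],
    [0,   0, 60,  0, 40],
    [0,   0,  0, 50, 30],
    [0,   0, 20,  0, 30],
    [0,   0,  0,  0,  0],
], dtype=float)

M = markov_matrix(F)
row_sums = M.sum(axis=1)    # 1 for every node except the sink, 0 for the sink
```

Guarding the division keeps the sink row at zero instead of producing NaN values, which is what makes the chain "open".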

According to the literature, regardless of whether circulations
exist in the network, the total flow from i to j can be
calculated as

$$\rho_{ij} = \phi_{0i} u_{ij}, \tag{2}$$

where

$$U = I + M + M^2 + \cdots = (I - M)^{-1} \tag{3}$$

is called the fundamental matrix; it is also the inverse of
the Laplacian of M. Here $\phi_{0i}$ is the first-passage flow from the
source to i, which will be calculated in the following
paragraphs, and I is the identity matrix of size (N + 1) × (N + 1).
We now provide an informal proof of Eq. (2).

Equation (2) sums the flows along all possible
paths from i to j. When $i \neq j$, the number of particles
that jump from i to j along all possible paths with k steps
is $\phi_{0i}(M^k)_{ij}$. Note that particles may flow back to i
several times; $\phi_{0i}$ rather than $f_{\cdot i}$ is adopted because $f_{\cdot i}$
contains the flows back to i. If the particles passing by i
are dyed, then $\phi_{0i}$ is the number of particles without the color
marker that will be dyed in each time step. Summing
$\phi_{0i}(M^k)_{ij}$ from k = 1 to ∞, we obtain the total
flow from i to j along all possible paths. According to the
series expansion $MU = M(I - M)^{-1} = M + M^2 + \cdots$
and the identity $(MU)_{ij} = u_{ij}$ when $i \neq j$, Eq. (2) is
obtained.

When i = j, according to the definition of ρ, $\rho_{ii}$ should also
contain the first-passage flow from the source to i; therefore
$\rho_{ii} = \phi_{0i}((MU)_{ii} + 1) = \phi_{0i}((MU)_{ii} + I_{ii}) = \phi_{0i} u_{ii}$, and
Eq. (2) holds.
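The two facts used in this argument, the series form of the fundamental matrix and the identity MU = U - I (which gives $(MU)_{ij} = u_{ij}$ for $i \neq j$ and $(MU)_{ii} + 1 = u_{ii}$), can be checked numerically. The Markov matrix below is a hypothetical example, and the truncation at 200 terms is an arbitrary choice.

```python
import numpy as np

# Hypothetical Markov matrix of an open flow network
# (node 0 = source, node 4 = sink; the sink row is all zeros).
M = np.array([
    [0, 1, 0.0, 0.0,   0.0],
    [0, 0, 0.6, 0.0,   0.4],
    [0, 0, 0.0, 0.625, 0.375],
    [0, 0, 0.4, 0.0,   0.6],
    [0, 0, 0.0, 0.0,   0.0],
])
I = np.eye(M.shape[0])

# Fundamental matrix, Eq. (3).
U = np.linalg.inv(I - M)

# Truncated series I + M + M^2 + ...; it converges here because every
# walker eventually leaves through the sink.
U_series = np.zeros_like(M)
term = I.copy()
for _ in range(200):
    U_series += term
    term = term @ M

# MU = M + M^2 + ... = U - I
MU = M @ U
```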

The total flow from i to j can be divided
into two categories: the first-passage flow,
which contains the particles that arrive at j for the first
time, and the circulation flow, which contains the
particles that arrive at j more than once. All the flows
are conditioned on starting from i. The
circulation flow is the sum of the flows from j to j
along all possible paths; it is calculated as

$$\psi_{ij} = \phi_{ij} \left( \sum_{k=1}^{\infty} M^k \right)_{jj}, \tag{4}$$

where $\psi_{ij}$ represents the circulation flow starting from i.

Therefore, the total flow from i to j can be expressed
as

$$\rho_{ij} = \phi_{ij} + \psi_{ij} = \phi_{ij} u_{jj}. \tag{5}$$

Thus, we obtain the expression for the first-passage flow
from i to j:

$$\phi_{ij} = \frac{\rho_{ij}}{u_{jj}}. \tag{6}$$

Based on Eq. (2) and Eq. (6), and noting that
$\phi_{00} = f_{0\cdot}$ by definition, where $f_{0\cdot}$
denotes the total flow from the "source" to the whole system,
we have

$$\rho_{ij} = \phi_{0i} u_{ij} = \frac{\rho_{0i}}{u_{ii}} u_{ij} = f_{0\cdot} \frac{u_{0i} u_{ij}}{u_{ii}}. \tag{7}$$

And the explicit expression for the first-passage flow is

$$\phi_{ij} = \frac{\rho_{ij}}{u_{jj}} = f_{0\cdot} \frac{u_{0i} u_{ij}}{u_{ii} u_{jj}}. \tag{8}$$


C. Calculation of flow distances


We can deduce explicit expressions for the various flow
distances once the expressions for the total flow and the
first-passage flow are given. First, according to the definition of
the total flow distance from i to j along all possible paths, we have

$$t_{ij} = \sum_{k=1}^{\infty} k \, p_{ij}^{k}, \tag{9}$$

where $p_{ij}^{k}$ denotes the probability that particles
transfer from i to j after k steps. One may think that $p_{ij}^{k} = (M^k)_{ij}$;
however, this is not true because $p_{ij}^{k}$ is
normalized over all paths with all possible lengths k, i.e.,
$\sum_{k=1}^{\infty} p_{ij}^{k} = 1$, whereas $(M^k)_{ij}$ is defined at a fixed
step k and is normalized over the destinations j only (for instance,
$\sum_{j} m_{ij} = 1$ for k = 1). We know that the flow from
i to j after k steps is $\phi_{0i}(M^k)_{ij}$ and the total flow along
all possible paths is $\rho_{ij}$; therefore,

$$p_{ij}^{k} = \frac{\phi_{0i}(M^k)_{ij}}{\rho_{ij}} = \frac{(M^k)_{ij}}{u_{ij}}. \tag{10}$$

Substituting this equation into Eq. (9), we have

$$t_{ij} = \sum_{k=1}^{\infty} k \frac{(M^k)_{ij}}{u_{ij}} = \frac{(MU^2)_{ij}}{u_{ij}}, \tag{11}$$

in which we have used the following series expansion:

$$MU^2 = M \left( \frac{1}{I - M} \right)^2 = \sum_{k=1}^{\infty} k M^k. \tag{12}$$

Similarly, we can obtain the expression for the first-passage
flow distance. First, according to the definition of the
first-passage distance from i to j, we have

$$l_{ij} = \sum_{k=1}^{\infty} k \, q_{ij}^{k}, \tag{13}$$

where $q_{ij}^{k}$ denotes the probability that particles starting
from i arrive at j for the first time after k steps. One cannot
use $p_{ij}^{k}$ (Eq. (10)) because it contains the circulation flow
from j to j. Let us assume that all the particles arriving
at j are removed from the system, that is to say, we
assume that j is another sink; then all the calculations
for the total flow distance remain correct. To make this point
clear, we define a new matrix $M_{-j}$ as

$$(M_{-j})_{rs} = \begin{cases} m_{rs}, & r \neq j, \\ 0, & r = j. \end{cases} \tag{14}$$
And the correct expression for the probability $q_{ij}^{k}$ is

$$q_{ij}^{k} = \frac{(M_{-j}^{k})_{ij}}{(U_{-j})_{ij}}, \tag{15}$$

where $U_{-j} = (I - M_{-j})^{-1}$ is the fundamental matrix of $M_{-j}$.
Inserting it into Eq. (13), we have

$$l_{ij} = \frac{(M_{-j} U_{-j}^{2})_{ij}}{(U_{-j})_{ij}}. \tag{16}$$

According to Theorem 1 proved in the Supplementary
Material, when $u_{ij} \neq 0$ (i connects to j), this
formula can be reduced to

$$l_{ij} = \frac{(MU^2)_{ij}}{u_{ij}} - \frac{(MU^2)_{jj}}{u_{jj}} = t_{ij} - t_{jj}. \tag{17}$$

Therefore, the difference between $t_{ij}$ and $l_{ij}$ is just the
total flow distance from j to j. The matrix $T - L$
has identical rows. Therefore, we can use a vector $t_j$ to
abbreviate $t_{jj}$; it quantifies the ability of
self-circulation of each node in the system.
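A minimal numerical sketch of Eqs. (7), (8), (11), and (17) is given below. The Markov matrix, the source outflow of 100, the function name `flow_distances`, and the tolerance used to decide that $u_{ij} = 0$ are all assumptions made for illustration; unreachable pairs are reported as infinite distances.

```python
import numpy as np

def flow_distances(M, f0, tol=1e-12):
    """Return (rho, phi, T, L) for Markov matrix M and source outflow f0."""
    n = M.shape[0]
    U = np.linalg.inv(np.eye(n) - M)           # fundamental matrix
    d = np.diag(U)                             # u_jj (always >= 1)
    rho = f0 * (U[0] / d)[:, None] * U         # Eq. (7): f0 * u_0i * u_ij / u_ii
    phi = rho / d[None, :]                     # Eq. (8): rho_ij / u_jj
    MU2 = M @ U @ U
    with np.errstate(divide="ignore", invalid="ignore"):
        T = np.where(U > tol, MU2 / U, np.inf)  # Eq. (11), inf if unreachable
    L = T - np.diag(T)[None, :]                # Eq. (17): l_ij = t_ij - t_jj
    return rho, phi, T, L

# Hypothetical network: node 0 = source, node 4 = sink.
M = np.array([
    [0, 1, 0.0, 0.0,   0.0],
    [0, 0, 0.6, 0.0,   0.4],
    [0, 0, 0.0, 0.625, 0.375],
    [0, 0, 0.4, 0.0,   0.6],
    [0, 0, 0.0, 0.0,   0.0],
])
rho, phi, T, L = flow_distances(M, f0=100.0)
```

In this sketch the row `rho[0]` reproduces the throughflow of every node, and `L[0, -1]` is the average number of steps a particle takes from the source to the sink.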

All the flow distances introduced above are asymmetric;
however, some practical tasks, such as node clustering and
the computation of node centrality, require symmetric
metrics. The commute distance [42] is a classical
symmetric distance measure defined by random walks,
which is calculated as $l_{ij} + l_{ji}$. However, this definition
does not work when either $l_{ij}$ or $l_{ji}$ is infinite, meaning that
i cannot reach j or vice versa. Therefore, we define a
new symmetric flow distance to avoid this problem:

$$c_{ij} = \frac{2}{\frac{1}{l_{ij}} + \frac{1}{l_{ji}}} = \frac{2 l_{ij} l_{ji}}{l_{ij} + l_{ji}}. \tag{18}$$

We call $c_{ij}$ the symmetric flow distance; it is the harmonic
mean of $l_{ij}$ and $l_{ji}$. Suppose $l_{ij} = \infty$; then $c_{ij} = 2 l_{ji}$,
which is well defined. When $l_{ij} = l_{ji}$, $c_{ij} = l_{ij} = l_{ji}$. Therefore, $c_{ij}$ is
a reasonable symmetric distance.
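Equation (18) can be evaluated safely in the presence of infinite entries by working with reciprocals, as in the sketch below. The matrix `L` is a made-up example, and the convention that a pair unreachable in both directions keeps an infinite distance is an assumption of this sketch.

```python
import numpy as np

def symmetric_flow_distance(L):
    """Harmonic mean of l_ij and l_ji, Eq. (18); 1/inf is treated as 0."""
    L = np.asarray(L, dtype=float)
    with np.errstate(divide="ignore"):
        inv = 1.0 / L              # zeros on the diagonal become inf here
        s = inv + inv.T
        C = 2.0 / s                # s == 0 (both directions inf) gives inf
    np.fill_diagonal(C, 0.0)       # a node is at distance 0 from itself
    return C

# Hypothetical first-passage flow distances; node 0 cannot reach node 2.
L = np.array([
    [0.0, 2.0, np.inf],
    [4.0, 0.0, 1.0],
    [5.0, 3.0, 0.0],
])
C = symmetric_flow_distance(L)
```

Here `C[0, 2]` equals twice the only finite direction, matching the limiting case discussed in the text.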


D. Calculation on an example network 


Before applying our method to real open flow networks,
we present the computation of flow distances
on a small example network and compare them with
other distances on graphs. The example network is shown
in Figure 1. There are 7 nodes, including the source and
the sink. All the flows are marked on the edges. The
first-passage flow distance matrix L is

$$L = \begin{pmatrix}
0 & 1 & 2.15 & 2.27 & 3.20 & 3.40 & 3.94 \\
\infty & 0 & 1.15 & 1.27 & 2.20 & 2.40 & 2.94 \\
\infty & \infty & 0 & 2.25 & 1.05 & 1.25 & 2.13 \\
\infty & \infty & 1 & 0 & 2.05 & 2.25 & 1.61 \\
\infty & \infty & 3 & 2 & 0 & 1 & 1.60 \\
\infty & \infty & 2 & 1 & 3.05 & 0 & 1.20 \\
\infty & \infty & \infty & \infty & \infty & \infty & 0
\end{pmatrix} \tag{19}$$



FIG. 1. An example open flow network 


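and the total flow distance matrix T is

$$T = \begin{pmatrix}
0 & 1 & 2.23 & 2.35 & 3.23 & 3.48 & 3.94 \\
\infty & 0 & 1.23 & 1.35 & 2.23 & 2.48 & 2.94 \\
\infty & \infty & 0.08 & 2.33 & 1.08 & 1.33 & 2.13 \\
\infty & \infty & 1.08 & 0.08 & 2.08 & 2.33 & 1.61 \\
\infty & \infty & 3.08 & 2.08 & 0.03 & 1.08 & 1.60 \\
\infty & \infty & 2.08 & 1.08 & 3.08 & 0.08 & 1.20 \\
\infty & \infty & \infty & \infty & \infty & \infty & 0
\end{pmatrix} \tag{20}$$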


Note that there are many ∞ entries in both L and T
because the corresponding node pairs are not connected by any
path. Another interesting phenomenon is that every element
of T is no smaller than the corresponding entry of L. The
difference (T − L) is

$$T - L = \begin{pmatrix}
0 & 0 & 0.08 & 0.08 & 0.03 & 0.08 & 0 \\
  & 0 & 0.08 & 0.08 & 0.03 & 0.08 & 0 \\
  &   & 0.08 & 0.08 & 0.03 & 0.08 & 0 \\
  &   & 0.08 & 0.08 & 0.03 & 0.08 & 0 \\
  &   & 0.08 & 0.08 & 0.03 & 0.08 & 0 \\
  &   & 0.08 & 0.08 & 0.03 & 0.08 & 0 \\
  &   &      &      &      &      & 0
\end{pmatrix} \tag{21}$$

The empty entries have no numeric value because ∞ − ∞
is undefined. All the elements in the same column are
identical; they are the average flow distances from i to
i for i = 2, 3, 4, 5. Because nodes 2, 3, and 5 lie on the same
cycles, $2 \to 5 \to 3$ and $2 \to 4 \to 5 \to 3$, they have the
same value of $t_{jj}$.

Next, we compare our first-passage flow distance $l_{ij}$
with the shortest path distance and with the first-passage distance
based on random walks [41] on the closed version of the
same network. In the latter comparison, the "source" and the
"sink" are excluded so that the network is closed. For
the random walkers in the closed network, the transition
probability from i to j is the ratio of
$f_{ij}$ to $(f_{i\cdot} - f_{i,N+1})$, the total outflow from i
excluding the dissipation from i. For example, the transition
probability from 2 to 4 is 20/(20 + 30) = 0.4 rather than
20/(20 + 30 + 10) = 0.33. The results are shown in Table I.
TABLE I. Comparisons among three kinds of distances on
selected node pairs

                 1 → 3    2 → 3    1 → 4    2 → 4
Shortest path    1        2        2        1
Closed FPD       2.5      2.4      6.875    5.5
Open FPD         1.274    2.25     2.2      1.055


As expected, the shortest path lengths are shorter
than the other two distances because they consider
only the shortest paths; as a result, this distance
always underestimates the distances travelled by random particles
in flows. Comparing the two first-passage distances is more
interesting. The closed first-passage distance (Closed FPD)
is always larger than the open first-passage distance (Open
FPD) because dissipation is not considered in the Closed
FPD. For example, the Closed FPD from 1 to 4 is about
three times the Open FPD because the longer
circulating path $2 \to 5 \to 3 \to 2$ has a much higher
probability when the dissipation from node 2 is neglected (0.6
rather than 0.5).
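The closed-network comparison can be sketched as follows: the source and the sink are dropped, each row is renormalized by the outflow excluding dissipation, and the mean first-passage time to a target is obtained from the standard linear system for Markov chains. The flow matrix here is hypothetical (it is not the network of Figure 1), and both function names are introduced only for illustration.

```python
import numpy as np

def closed_transition_matrix(F):
    """Drop source and sink, then divide each row by f_{i.} - f_{i,N+1}.

    Assumes every ordinary node sends some flow to another ordinary node.
    """
    inner = np.asarray(F, dtype=float)[1:-1, 1:-1]
    return inner / inner.sum(axis=1, keepdims=True)

def closed_first_passage(P, target):
    """Mean number of steps to first reach `target` from every other node."""
    n = P.shape[0]
    keep = [i for i in range(n) if i != target]
    Q = P[np.ix_(keep, keep)]                  # walk restricted to non-targets
    h = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    out = np.zeros(n)
    out[keep] = h
    return out

# Hypothetical balanced flows: node 0 = source, nodes 1..3, node 4 = sink.
F = np.array([
    [0, 60,  0,  0,  0],
    [0,  0, 50, 30, 10],
    [0, 20,  0, 20, 10],
    [0, 10,  0,  0, 40],
    [0,  0,  0,  0,  0],
], dtype=float)

P = closed_transition_matrix(F)          # rows and columns are nodes 1..3
h = closed_first_passage(P, target=2)    # index 2 corresponds to node 3
```

Note that the indices of `P` are shifted by one with respect to the node labels because the source has been removed.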


III. EMPIRICAL STUDIES 

In this section, we apply our flow distances to two
kinds of networks: 18 energetic food webs (which include energy flow
information between species) and the input-output
network of the U.S. The raw food web data come from
an open data source [44], and the input-output
network data come from [45].


A. Food web 

Trophic level is an important concept in food webs; it
characterizes a species' distance from the source (sunlight)
along the food chain. However, when the food web
is an entangled network, calculating the shortest path
from the source always underestimates the trophic level
of a given species because non-shortest paths may have
much longer distances from the source. Therefore, we
quantify the trophic levels of different species by the
first-passage flow distance from the source, that is, $l_{0i}$
for any species i. This distance is reasonable because
it contains the information of the weights and of all possible
energetic pathways from the source. Figure 2(a) visualizes
the trophic levels of the 125 biological species in the Baydry
food web. Producer species are located in the area close to
the source, and higher-level consumers are located at the
periphery.
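A possible way to compute such trophic levels and a radial layout of the kind described for Figure 2 is sketched below. It assumes that every node is reachable from the source; the Markov matrix, the function names, and the fixed random seed are illustrative assumptions rather than the procedure actually used for the figure.

```python
import numpy as np

def trophic_levels(M):
    """First-passage flow distance l_{0i} from the source to every node.

    Uses l_0i = t_0i - t_ii with t_ij = (M U^2)_ij / u_ij, and assumes
    u_0i > 0 for all i (every node is reachable from the source).
    """
    n = M.shape[0]
    U = np.linalg.inv(np.eye(n) - M)
    MU2 = M @ U @ U
    return MU2[0] / U[0] - np.diag(MU2) / np.diag(U)

def polar_layout(levels, seed=0):
    """Place each node at radius = trophic level with a random polar angle."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=len(levels))
    return np.column_stack((levels * np.cos(theta), levels * np.sin(theta)))

# Hypothetical network: node 0 = source, node 4 = sink.
M = np.array([
    [0, 1, 0.0, 0.0,   0.0],
    [0, 0, 0.6, 0.0,   0.4],
    [0, 0, 0.0, 0.625, 0.375],
    [0, 0, 0.4, 0.0,   0.6],
    [0, 0, 0.0, 0.0,   0.0],
])
levels = trophic_levels(M)
xy = polar_layout(levels)
```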

Next, we calculate several distances at the level of the
entire network. The first is the first-passage
distance from the source to the sink ($l_{0,N+1}$). This
distance quantifies the average number of steps that a random
particle takes over its whole life span in the network. The second
is the mean value of the elements of the matrix L, excluding the
infinite elements. We calculate these distances for all the
collected energetic food webs and observe how they
change with network size.

Figure 3 shows how the various distances change with the number
of edges of the networks. We find that the average value of
$l_{ij}$ has a similar trend to the path length $l_{0,N+1}$
from the source to the sink. The shortest path length is
always shorter than the average $l_{ij}$ and $l_{0,N+1}$ because it
does not consider the average behaviour of random
walkers. There is a slight trend for the network lengths
to increase with network size.


B. Input-output network 

The input-output network is another kind of flow network.
Each industrial sector corresponds to a vertex, and an
input from one sector to another can be considered as a
flow. However, there are two ways to represent
an input-output network as a flow network. If we consider
material flow, then the input from sector i to sector j should be
understood as a flow from i to j. If money flow is considered
instead, the flow runs from j to i. We adopt the
viewpoint of money flow in this paper because the flow
of money among sectors resembles random walkers
in open flow networks. In this way, the final demand
sector is the source of money flows, and the value-added
sector is the sink. We choose the input-output data of the
United States in 2000 as an example to calculate the various
flow distances.

First, it is interesting to calculate the economic "trophic
levels" ($l_{0i}$) of different sectors (see Figure 2(b)). The
sectors with shorter distances from the center are closer
to the source; therefore they are more easily affected
by the final demand. Any fluctuations of demand or
price can be transferred to the sectors with lower "trophic
levels".

Second, we can use flow distances to calculate the
similarity between different sectors. Because it is much easier to
deal with a symmetric similarity, we use the distances $c_{ij}$
instead of $l_{ij}$ here. With this symmetric measure, we can
cluster sectors using standard hierarchical clustering
techniques. The result is visualized in Figure 4.
In this figure, similar or related sectors are grouped
closely, such as Public admin and Health & social work, or
Mining and Fuel. We also find that the Real estate sector
is close to the Finance sector, which means that real estate is
tightly related to finance in the U.S. The clustering results
agree well with common knowledge of industrial
sectors.

Furthermore, the symmetric measure $c_{ij}$ can be used
to measure the centrality of each node, because if the
average of $c_{ij}$ over the different j is small, then i must have tight
connections with all other nodes. Formally, we define the
centrality of node i as

$$c_i = \frac{1}{N} \sum_{j=1}^{N} c_{ij}. \tag{22}$$
FIG. 2. Trophic levels of species in the Baydry community (a) and of industrial sectors of the U.S. input-output network in 2000 (b). The
polar radii, i.e., the distances between every node and the center, are proportional to the nodes' trophic levels, the polar angles are
randomly assigned, and the sizes of the nodes are proportional to the logarithm of the total throughflow of each node
($f_{i\cdot}$). The colors are assigned randomly.



FIG. 3. Three kinds of distances for all collected energetic 
food webs. All food webs are sorted according to their number 
of edges in an increasing order 


Thus, the smaller $c_i$ is, the more central the position of node i
in the whole economic system. We color the
nodes in Fig. 4 by $c_i$; the color depth increases as
$c_i$ decreases. We find that the Trade and Public admin.
sectors are more central than other sectors in the U.S., and
the Agriculture and Mining sectors are less important than
the average.
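As a small illustration of this ranking, the sketch below takes the centrality of a node to be the plain average of its symmetric flow distances to all nodes; this normalization and the 3 × 3 matrix `C` are assumptions made for the example, not data from the input-output network.

```python
import numpy as np

# Hypothetical symmetric flow distance matrix for three nodes.
C = np.array([
    [0.0,   8 / 3, 10.0],
    [8 / 3, 0.0,   1.5],
    [10.0,  1.5,   0.0],
])

# Assumed centrality: the mean of c_ij over j (smaller = more central).
c = C.mean(axis=1)

# Node indices ordered from most central to least central.
ranking = np.argsort(c)
```

With infinite entries in `C`, a node that is disconnected from any other node would receive an infinite value under this convention.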

Finally, we calculate the vector $t_i$ for all i. It is defined
as the average number of steps of a random walker that starts from
i and finally returns to i. This measure indicates
the recycling capability of a sector in the sense of money
flow. Therefore, a smaller $t_i$ implies a larger capability of
self-maintenance of this sector. In Table II, we show the top
5 and bottom 5 sectors in decreasing order of $t_i$ in
the United States.


TABLE II. List of sectors sorted by $t_i$

Rank    Sector in USA
1       Motor vehicles, trailers and semi-trailers
2       Finance and insurance
3       Basic metals
4       Chemicals and chemical products
5       Agriculture, hunting, forestry and fishing
32      Electricity, gas and water supply
33      Hotels and restaurants
34      Construction
35      Education
36      Health and social work


The top five sectors are more likely to be connected to other
sectors in the economy. By analysing the flux
matrix F, we find that they have smaller fractions of flows from
the source or to the sink. On the contrary, the last five
sectors mainly provide services or products for
final demand.

FIG. 4. Hierarchical clustering of different industrial sectors
in the U.S. Colors represent node centrality. All the sector names
are abbreviated; the full names are given in the
Supplementary Material.


IV. DISCUSSION 

In this paper, we introduce flow distances on
open flow networks. These distances characterize the
interactions between different nodes, and all of them can
be expressed explicitly in terms of the Markov matrix. We give
several examples of potential applications of flow
distances to energetic food webs and to an input-output network.
First, the trophic level, an important concept introduced in
food web ecology, can be applied to other open flow
networks. Usually, the nodes with lower trophic levels
are more likely to be influenced by the source.
Second, we can use flow distances to cluster nodes
because the symmetric distance $c_{ij}$ can be regarded as a
kind of similarity measure. We also use $c_{ij}$ to compare
the centrality of different nodes. The vector $t_i$ can
be used as an indicator to compare the independence
of different nodes. Because all the flow distances reflect
the nature of random walks on an open flow network, they
combine the topology of the network with the flow dynamics
on it. Therefore, these distances are expected to find wide
application.

Certainly, the applications of flow distances are not
limited to the examples listed in this paper. First,
open flow networks are combinations of network
structure and random walk dynamics, so visualizing these
networks requires particular methods. Besides placing the
nodes in a space directly by their "trophic levels",
we can embed the flow network into a Euclidean
space according to the distance $c_{ij}$, such that the Euclidean
distance of any given pair i and j is as close as possible to their
$c_{ij}$. This embedding problem can be solved by
optimizing the position of each node in the Euclidean space. The
patterns of the nodes distributed in the space may
help us to understand the flow network structure in an
intuitive way. However, how to visualize open flow
networks so as to reflect the directionality
and weights of the edges is another important issue
deserving further study. Second, open flow networks
often resemble tree structures that are hierarchical and
possess multi-level structures. How to partition a flow
network into several smaller sub-structures, and how to
coarse-grain these structures, is also an interesting
problem. It is reasonable to develop a novel method based
on the flow distances discussed in this paper to partition and
coarse-grain flow networks. Third, the flow distance metrics and
network embedding can help us to understand some
underlying dynamical processes on the network in a geometric
way [37].

Flow distances can also be applied to other open
flow networks and may help us to compare them.
Trade flow networks, traffic flow networks, and attention flow
networks are all important examples. Applying
flow distances to these networks may reveal important
common patterns.

The current flow distance metrics also have
shortcomings. The computational cost increases quickly with
the size of the network because the matrices U and L are
non-sparse when the network is large. Therefore,
approximate algorithms for flow distances are needed.
Additionally, all the flow distance metrics
are average values over the various paths of particles; the
variances of these path lengths are not reflected in these metrics.
New indicators are needed to represent the fluctuations
of different paths. All these problems deserve further
study.
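I. PROOF OF A THEOREM

In this appendix, we prove Eq. (17) of the main text. Before
that, several lemmas need to be proved.

Lemma 1: The following equation is true:

$$I = (I - M)U. \tag{1}$$

Proof: It is obvious according to the definition
$U = (I - M)^{-1}$.

Lemma 2: The following equation is true:

$$(I - M)U = (I - M_{-d})U_{-d} = I, \tag{2}$$

where $M_{-d}$ is the matrix obtained when the dth row of the matrix
M is set to zero. Thus,

$$M = M_{-d} + \Delta M, \tag{3}$$

where

$$(\Delta M)_{ij} = \begin{cases} m_{ij}, & i = d, \\ 0, & i \neq d. \end{cases} \tag{4}$$

Correspondingly, $U_{-d}$ is

$$U_{-d} = I + M_{-d} + M_{-d}^2 + \cdots = (I - M_{-d})^{-1}. \tag{5}$$

Proof: This is also obvious according to Lemma 1.

Lemma 3: The following equation holds for any d, i, j
belonging to [1, N]:

$$(U_{-d})_{ij} = u_{ij} - (U_{-d})_{id}(u_{dj} - \delta_{dj}) \tag{6}$$

$$= u_{ij} - \frac{u_{id}}{u_{dd}} u_{dj} + \frac{u_{id}}{u_{dd}} \delta_{dj}, \tag{7}$$

where

$$\delta_{ij} = \begin{cases} 1, & i = j, \\ 0, & i \neq j. \end{cases} \tag{8}$$

Proof: According to Lemma 2, we have

$$U - MU = U - (M_{-d} + \Delta M)U = U_{-d} - M_{-d}U_{-d} \tag{9}$$

$$\Rightarrow U - U_{-d} = (M_{-d} + \Delta M)U - M_{-d}U_{-d} = M_{-d}(U - U_{-d}) + \Delta M U$$

$$\Rightarrow (I - M_{-d})(U - U_{-d}) = \Delta M U. \tag{10}$$

Since

$$U_{-d}(I - M_{-d}) = I, \tag{11}$$

we obtain

$$U - U_{-d} = U_{-d} \cdot \Delta M \cdot U. \tag{12}$$

According to the definition of $\Delta M$, and according to the
fact that

$$\sum_{k} m_{ik} u_{kj} = u_{ij} - \delta_{ij}, \tag{13}$$

we can expand $U_{-d} \cdot \Delta M \cdot U$ as

$$(U_{-d} \cdot \Delta M \cdot U)_{ij} = (U_{-d})_{id} \sum_{k} m_{dk} u_{kj} = (U_{-d})_{id}(u_{dj} - \delta_{dj}). \tag{14}$$

So, we get

$$u_{ij} - (U_{-d})_{ij} = (U_{-d})_{id}(u_{dj} - \delta_{dj}). \tag{15}$$

In the above equation, if we let j = d, then

$$u_{id} - (U_{-d})_{id} = (U_{-d})_{id}(u_{dd} - 1). \tag{16}$$

Thus,

$$(U_{-d})_{id} = \frac{u_{id}}{u_{dd}}. \tag{17}$$

Inserting it into Equation (15), we have

$$u_{ij} - (U_{-d})_{ij} = \frac{u_{id}}{u_{dd}}(u_{dj} - \delta_{dj}). \tag{18}$$

At last, rearranging this equation, we obtain

$$(U_{-d})_{ij} = u_{ij} - \frac{u_{id}}{u_{dd}} u_{dj} + \frac{u_{id}}{u_{dd}} \delta_{dj}. \tag{19}$$

Lemma 4: Based on these lemmas, we can get the following
equation:

$$(U_{-j}^2)_{ij} = \frac{(U^2)_{ij}}{u_{jj}} - \frac{u_{ij}(U^2)_{jj}}{u_{jj}^2} + \frac{u_{ij}}{u_{jj}}. \tag{20}$$

Proof: Expand $U_{-j}^2$ into elements and substitute Lemma
3 (with d = j, which gives $(U_{-j})_{kj} = u_{kj}/u_{jj}$) into it; then

$$(U_{-j}^2)_{ij} = \sum_{k} (U_{-j})_{ik}(U_{-j})_{kj}
= \sum_{k} \left( u_{ik} - \frac{u_{ij}}{u_{jj}} u_{jk} + \frac{u_{ij}}{u_{jj}} \delta_{jk} \right) \frac{u_{kj}}{u_{jj}}
= \frac{(U^2)_{ij}}{u_{jj}} - \frac{u_{ij}(U^2)_{jj}}{u_{jj}^2} + \frac{u_{ij}}{u_{jj}}. \tag{21}$$

Theorem 1: Equation (17) in the main text, or the
following equation,

$$\frac{(M_{-j}U_{-j}^2)_{ij}}{(U_{-j})_{ij}} = \frac{(MU^2)_{ij}}{u_{ij}} - \frac{(MU^2)_{jj}}{u_{jj}}, \tag{22}$$

holds when $u_{ij} \neq 0$.

Proof: Substitute $M \cdot U^2 = U^2 - U$ and
$M_{-j} \cdot U_{-j}^2 = U_{-j}^2 - U_{-j}$, which follow from Lemma 1 and Lemma 2,
and apply Lemma 3 and Lemma 4; then

$$\frac{(M_{-j}U_{-j}^2)_{ij}}{(U_{-j})_{ij}}
= \frac{(U_{-j}^2)_{ij}}{(U_{-j})_{ij}} - 1
= \frac{(U^2)_{ij}}{u_{ij}} - \frac{(U^2)_{jj}}{u_{jj}}
= \frac{(MU^2)_{ij}}{u_{ij}} - \frac{(MU^2)_{jj}}{u_{jj}}, \tag{23}$$

and the theorem is proved.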
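Lemma 3 and Theorem 1 can also be checked numerically on a small network, which is a useful safeguard when implementing the distances. The Markov matrix and the helper `without_row` below are hypothetical and serve only as a sanity check, not as part of the proof.

```python
import numpy as np

# Hypothetical Markov matrix (node 0 = source, node 4 = sink).
M = np.array([
    [0, 1, 0.0, 0.0,   0.0],
    [0, 0, 0.6, 0.0,   0.4],
    [0, 0, 0.0, 0.625, 0.375],
    [0, 0, 0.4, 0.0,   0.6],
    [0, 0, 0.0, 0.0,   0.0],
])
n = M.shape[0]
I = np.eye(n)
U = np.linalg.inv(I - M)

def without_row(M, d):
    """Return M_{-d}: a copy of M whose d-th row is set to zero."""
    Md = M.copy()
    Md[d, :] = 0.0
    return Md

# Lemma 3: (U_{-d})_ij = u_ij - u_id (u_dj - delta_dj) / u_dd
d = 2
U_d = np.linalg.inv(I - without_row(M, d))
lemma3 = U - np.outer(U[:, d], U[d] - I[d]) / U[d, d]

# Theorem 1 for one pair (i, j) with u_ij != 0
i, j = 1, 3
M_j = without_row(M, j)
U_j = np.linalg.inv(I - M_j)
MU2 = M @ U @ U
lhs = (M_j @ U_j @ U_j)[i, j] / U_j[i, j]
rhs = MU2[i, j] / U[i, j] - MU2[j, j] / U[j, j]
```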


II. NAME LIST FOR SECTORS OF 
INPUT-OUTPUT NETWORK 

Sector names in Figure 4 are abbreviated. The corresponding
full names are listed in Table I.
TABLE I. Sector full names

Abbreviation            Full name
Agriculture             Agriculture, hunting, forestry and fishing
Mining                  Mining and quarrying
Food                    Food products, beverages and tobacco
Textiles                Textiles, textile products, leather and footwear
Wood                    Wood and products of wood and cork
Pulp                    Pulp, paper, paper products, printing and publishing
Fuel                    Coke, refined petroleum products and nuclear fuel
Chemicals               Chemicals and chemical products
Rubber                  Rubber and plastics products
Other mineral           Other non-metallic mineral products
Basic metals            Basic metals
Fabricated metal        Fabricated metal products except machinery and equipment
Machinery               Machinery and equipment n.e.c
Office machinery        Office, accounting and computing machinery
E-machinery             Electrical machinery and apparatus n.e.c
Communication eq.       Radio, television and communication equipment
Med. instruments        Medical, precision and optical instruments
Motor vehicles          Motor vehicles, trailers and semi-trailers
Oth. trans eq.          Other transport equipment
Manu. n.e.c             Manufacturing n.e.c; recycling
Electricity             Electricity, gas and water supply
Construction            Construction
Trade                   Wholesale and retail trade; repairs
Hotels                  Hotels and restaurants
Trans & storage         Transport and storage
Communications          Post and telecommunications
Finance                 Finance and insurance
Real estate             Real estate activities
Renting                 Renting of machinery and equipment
Computer                Computer and related activities
R&D                     Research and development
Other Business          Other Business Activities
Public admin            Public admin. and defence; compulsory social security
Education               Education
Health & social work    Health and social work
Other services          Other community, social and personal services
Private households      Private households with employed persons



